{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# What is machine learning, and how does it work? ([video #1](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1))\n",
    "\n",
    "Created by [Data School](http://www.dataschool.io/). Watch all 9 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Machine learning](images/01_robot.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Agenda\n",
    "\n",
    "- What is machine learning?\n",
    "- What are the two main categories of machine learning?\n",
    "- What are some examples of machine learning?\n",
    "- How does machine learning \"work\"?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is machine learning?\n",
    "\n",
    "One definition: \"Machine learning is the semi-automated extraction of knowledge from data\"\n",
    "\n",
    "- **Knowledge from data**: Starts with a question that might be answerable using data\n",
    "- **Automated extraction**: A computer provides the insight\n",
    "- **Semi-automated**: Requires many smart decisions by a human"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What are the two main categories of machine learning?\n",
    "\n",
    "**Supervised learning**: Making predictions using data\n",
    "    \n",
    "- Example: Is a given email \"spam\" or \"ham\"?\n",
    "- There is an outcome we are trying to predict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Spam filter](images/01_spam_filter.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Unsupervised learning**: Extracting structure from data\n",
    "\n",
    "- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n",
    "- There is no \"right answer\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Clustering](images/01_clustering.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How does machine learning \"work\"?\n",
    "\n",
    "High-level steps of supervised learning:\n",
    "\n",
    "1. First, train a **machine learning model** using **labeled data**\n",
    "\n",
    "    - \"Labeled data\" has been labeled with the outcome\n",
    "    - \"Machine learning model\" learns the relationship between the attributes of the data and its outcome\n",
    "\n",
    "2. Then, make **predictions** on **new data** for which the label is unknown"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Supervised learning](images/01_supervised_learning.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The primary goal of supervised learning is to build a model that \"generalizes\": It accurately predicts the **future** rather than the **past**!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Questions about machine learning\n",
    "\n",
    "- How do I choose **which attributes** of my data to include in the model?\n",
    "- How do I choose **which model** to use?\n",
    "- How do I **optimize** this model for best performance?\n",
    "- How do I ensure that I'm building a model that will **generalize** to unseen data?\n",
    "- Can I **estimate** how well my model is likely to perform on unseen data?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resources\n",
    "\n",
    "- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)\n",
    "- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Comments or Questions?\n",
    "\n",
    "- Email: <kevin@dataschool.io>\n",
    "- Website: http://dataschool.io\n",
    "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    @font-face {\n",
       "        font-family: \"Computer Modern\";\n",
       "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
       "    }\n",
       "    div.cell{\n",
       "        width: 90%;\n",
       "/*        margin-left:auto;*/\n",
       "/*        margin-right:auto;*/\n",
       "    }\n",
       "    ul {\n",
       "        line-height: 145%;\n",
       "        font-size: 90%;\n",
       "    }\n",
       "    li {\n",
       "        margin-bottom: 1em;\n",
       "    }\n",
       "    h1 {\n",
       "        font-family: Helvetica, serif;\n",
       "    }\n",
       "    h4{\n",
       "        margin-top: 12px;\n",
       "        margin-bottom: 3px;\n",
       "       }\n",
       "    div.text_cell_render{\n",
       "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
       "        line-height: 145%;\n",
       "        font-size: 130%;\n",
       "        width: 90%;\n",
       "        margin-left:auto;\n",
       "        margin-right:auto;\n",
       "    }\n",
       "    .CodeMirror{\n",
       "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
       "    }\n",
       "/*    .prompt{\n",
       "        display: None;\n",
       "    }*/\n",
       "    .text_cell_render h5 {\n",
       "        font-weight: 300;\n",
       "        font-size: 16pt;\n",
       "        color: #4057A1;\n",
       "        font-style: italic;\n",
       "        margin-bottom: 0.5em;\n",
       "        margin-top: 0.5em;\n",
       "        display: block;\n",
       "    }\n",
       "\n",
       "    .warning{\n",
       "        color: rgb( 240, 20, 20 )\n",
       "        }\n",
       "</style>\n",
       "<script>\n",
       "    MathJax.Hub.Config({\n",
       "                        TeX: {\n",
       "                           extensions: [\"AMSmath.js\"]\n",
       "                           },\n",
       "                tex2jax: {\n",
       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
       "                },\n",
       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
       "                \"HTML-CSS\": {\n",
       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
       "                }\n",
       "        });\n",
       "</script>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from IPython.core.display import HTML\n",
    "def css_styling():\n",
    "    styles = open(\"styles/custom.css\", \"r\").read()\n",
    "    return HTML(styles)\n",
    "css_styling()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
